EXAMINER'S AMENDMENT

1. An examiner's amendment to the record appears below. Should the changes
and/or additions be unacceptable to applicant, an amendment may be filed as provided 
by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be
submitted no later than the payment of the issue fee. 

Authorization for this examiner's amendment was given in a telephone interview 
with Tawni Wilhelm (Reg. No. 47,456) on November 16, 2009. 

Please amend the claims as follows:

1. (Currently Amended) A system that facilitates incremental web crawls
comprising the following components stored in computer memory and executable by a 
processor: 

a selecting component that selects a first chunk from a group that includes at 
least one index chunk that stores information associated with an index of items, at least 
one rank chunk that stores at least one static rank associated with the at least one index 
chunk, at least one content chunk that stores cached copies of contents of pages 
crawled, at least one re-crawl chunk that stores a list of Uniform Resource Locators to 
be re-crawled, and at least one webmap chunk that stores at least a portion of a web 
map used for calculating the at least one static rank; 

a chunk map that stores properties associated with the first chunk and that is 
employed to determine the first chunk based on the stored properties, and wherein the 
stored properties include average time between change and average importance and 
are shared by all items in the first chunk;

a parsing component that parses the first chunk for Uniform Resource Locators;

a crawling component that re-crawls the Uniform Resource Locators;

a receiving component that 

(1) receives one or more documents as a result of re-crawling the uniform 
resource locators,

(2) stores a document from the one or more documents in an appropriate
chunk when it is determined that the document belongs to the appropriate 
chunk, and 

(3) forms a second chunk separate from the first chunk based, at least in 
part, upon the re-crawled uniform resource locators when it is determined 
that the document does not belong to any chunk from the group, and 
storing the document in the second chunk 

[[an indexer that places items with similar properties into respective chunks,
wherein the properties are shared by all the items within a respective chunk, wherein
the items are the results returned by a web crawl, and wherein the respective chunks
include at least one rank chunk and at least one webmap chunk; and

a chunk map that stores at least some of the properties associated with each
respective chunk, wherein the properties include average time between change and
average importance of documents in each respective chunk, wherein the chunk map is
employed to facilitate an incremental web re-crawl, and wherein the properties of each
respective chunk stored in the chunk map are utilized to determine a re-crawl of all the
items in that respective chunk]].



2. (Previously Presented) The system of claim 1, wherein the items comprise
information associated with a Uniform Resource Locator. 

3. (Previously Presented) The system of claim 1, wherein the items comprise
at least one of an HTML file, a PDF file, a PS file, a PPT file, an XLS file and a DOC file. 

4. (Previously Presented) The system of claim 1, wherein the items are
received from a crawler, and wherein the crawler is responsible for a specific set of 
Uniform Resource Locators. 

5. (Original) The system of claim 1, further comprising a master control
process that can modify the chunk map to facilitate load balancing amongst a plurality of 
crawlers. 

6. (Original) The system of claim 1, further comprising a master control
process that serves as an interface between a crawler and a re-crawl controller. 

7. (Previously Presented) The system of claim 6, wherein the master control 
process maintains a known chunks table that stores information for components of the 
system. 

8. (Original) The system of claim 6, wherein the master control process 
exposes an interface for communication with a component of the system. 

9. (Original) The system of claim 8, wherein the interface returns a list of
chunks the component should have and where to get the chunks. 

10. (Currently Amended) The system of claim 8, wherein the interface returns
a list of the chunks that should be actively served by the component. 

11. (Original) The system of claim 8, wherein the interface returns a range of
chunk identifiers to use in building a new chunk by the component. 

12. (Original) The system of claim 8, wherein the interface causes an old 
chunk to be retired by the system. 

13. (Original) The system of claim 6, wherein the master control process
facilitates movement of chunks from one component to another component. 

14. (Previously Presented) The system of claim 13, wherein movement of
chunks is based, at least in part, upon at least one of rebalancing index servers after 
one goes down, re-crawling pages previously crawled, and restoring a state of the 
crawler after it has crashed. 

15. (Currently Amended) The system of claim 1, [[further comprising a re-crawl
component that]] wherein the crawling component employs the chunk map to determine
which chunks, if any, to re-crawl at a particular time.

16. (Canceled)

17. (Original) The system of claim 1, further comprising an index chunk that
stores information associated with an index of at least some of the items. 

18. (Canceled) 

19. (Currently Amended) Computer-readable storage media having
computer-useable instructions embodied thereon for performing a method of [[A method of
performing]] document re-crawl, the method comprising:

selecting a first chunk from a group that includes at least one index chunk that
stores information associated with an index of items, at least one rank chunk that stores
at least one static rank associated with the at least one index chunk, at least one
content chunk that stores cached copies of contents of pages crawled, at least one
re-crawl chunk that stores a list of Uniform Resource Locators to be re-crawled, and at
least one webmap chunk that stores at least a portion of a web map used for calculating
the at least one static rank, wherein a chunk map that stores properties associated with
the first chunk is employed to determine the first chunk based on the stored properties,
and wherein the stored properties include average time between change and average
importance and are shared by all items in the first chunk;

parsing the first chunk for Uniform Resource Locators; 

re-crawling the Uniform Resource Locators; 

receiving one or more documents as a result of re-crawling the uniform resource 
locators; 

storing a document from the one or more documents in an appropriate chunk 
when it is determined that the document belongs to the appropriate chunk; and, 

forming a second chunk separate from the first chunk based, at least in part, 
upon the re-crawled uniform resource locators when it is determined that the document 
does not belong to any chunk from the group, and storing the document in the second 
chunk. 

20. (Currently Amended) The [[method]] media of claim 19, wherein the method
further [[comprising]] comprises at least one of the following acts:
determining whether any chunks are to be retired; 
moving the first chunk; and 
destroying the first chunk. 



21. (Canceled) 

22. (Currently Amended) A method of performing document re-crawl
comprising: 

selecting, utilizing a first computing process, a first chunk from a group that 
includes at least one index chunk that stores information associated with an index of 
items, at least one rank chunk that stores at least one static rank associated with the at 
least one index chunk, at least one content chunk that stores cached copies of contents 
of pages crawled, at least one re-crawl chunk that stores a list of Uniform Resource 
Locators to be re-crawled, and at least one webmap chunk that stores at least a portion 
of a web map used for calculating the at least one static rank, wherein a chunk map that 
stores properties associated with the first chunk is employed to determine the first 
chunk based on the stored properties, and wherein the stored properties include 
average time between change and average importance and are shared by all items in
the first chunk;

parsing, utilizing a second computing process, the first chunk for Uniform 
Resource Locators; 

re-crawling, utilizing a third computing process, the Uniform Resource Locators;

receiving one or more documents as a result of re-crawling the uniform resource
locators;

storing a document from the one or more documents in an appropriate chunk 
when it is determined that the document belongs to the appropriate chunk; and,

forming, utilizing a fourth computing process, a second chunk separate from the 
first chunk based, at least in part, upon the re-crawled uniform resource locators when it 
is determined that the document does not belong to any chunk from the group, and
storing the document in the second chunk, 

wherein the first, second, third and fourth computing processes are performed by 
one or more computing devices 

[[parsing a first chunk for Uniform Resource Locators, wherein the Uniform
Resource Locators are stored as a result of one or more web crawls;

accessing a chunk map comprising properties associated with one or more
chunks that include the first chunk, wherein the stored properties are shared by all the
items in the first chunk, and wherein the properties include average time between
change and average importance of documents in the first chunk; and,

periodically determining, based on the properties of each of the one or more
chunks in the chunk map, whether to re-crawl the items in the first chunk]].

23. (Currently Amended) The method of claim 22, further comprising
periodically determining whether to re-crawl the Uniform Resource Locators based, at
least in part, upon at least one of the average time between change and the average
importance of documents comprising a particular chunk [[wherein the periodic
determination is based, at least in part, upon at least one of average time between
change and average importance of documents comprising a particular chunk]].



24-28. (Canceled) 

REASONS FOR ALLOWANCE

2. The following is an examiner's statement of reasons for allowance: Independent 
claim 1 is drawn to a system that facilitates incremental web crawls, independent claim 
19 to computer-readable storage media containing instructions for performing a method 
of document re-crawl, and independent claim 22 to a method of performing document 
re-crawl. The system, computer-readable storage media and method, among other 
things, all utilize a plurality of different chunks and select a chunk from the plurality of 
chunks and re-crawl the URLs in the selected chunk. The chunks include at least one 
index chunk, at least one rank chunk, at least one content chunk, at least one re-crawl 
chunk and at least one webmap chunk. 

The prior art references made of record do not teach, either together or
separately, a web crawler that selects a first chunk from the group including all of the 
chunks mentioned above in order to re-crawl the URLs within the selected chunk. Nor 
does the prior art of record teach forming a second chunk separate from the first chunk 
based upon the re-crawled URLs when it is determined that the document does not 
belong to any chunk from the group. 
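
For illustration only, the claimed sequence of selecting a chunk, parsing it for URLs, re-crawling, and either storing each document in an existing chunk or forming a second chunk might be sketched as follows. This is a minimal sketch and not the applicant's implementation. The names `Chunk`, `ChunkProperties`, `select_chunk`, `recrawl`, `fetch` and `belongs_to` are hypothetical, and the selection rule (average importance divided by average time between change) is an assumption, since the record states only that the stored properties are used to determine the first chunk.

```python
import re
from dataclasses import dataclass, field

@dataclass
class ChunkProperties:
    # Properties shared by all items in a chunk, as held in the chunk map.
    avg_time_between_change: float  # assumed unit: hours
    avg_importance: float

@dataclass
class Chunk:
    chunk_id: int
    kind: str          # "index", "rank", "content", "recrawl" or "webmap"
    payload: str = ""  # raw text from which URLs are parsed
    documents: dict = field(default_factory=dict)  # url -> document body

URL_RE = re.compile(r"https?://\S+")

def select_chunk(chunks, chunk_map):
    """Pick the first chunk using the properties stored in the chunk map.

    Assumed rule: prefer important chunks whose items change often.
    """
    def score(chunk):
        props = chunk_map[chunk.chunk_id]
        return props.avg_importance / props.avg_time_between_change
    return max(chunks, key=score)

def parse_urls(chunk):
    """Parse the chunk for Uniform Resource Locators."""
    return URL_RE.findall(chunk.payload)

def recrawl(first, chunks, fetch, belongs_to):
    """Re-crawl the URLs in `first` and store each received document.

    `fetch(url)` returns a document; `belongs_to(url, doc, chunks)` returns
    the appropriate existing chunk, or None if the document belongs to none.
    Returns the newly formed second chunk, or None if none was needed.
    """
    second = None
    for url in parse_urls(first):
        doc = fetch(url)
        target = belongs_to(url, doc, chunks)
        if target is None:
            if second is None:
                # Form a second chunk, separate from the first.
                next_id = max(c.chunk_id for c in chunks) + 1
                second = Chunk(chunk_id=next_id, kind="content")
            target = second
        target.documents[url] = doc
    return second
```

In this sketch the caller supplies `fetch` and `belongs_to`, so the decision of whether a document belongs to an existing chunk stays outside the re-crawl loop.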

Any comments considered necessary by applicant must be submitted no later 
than the payment of the issue fee and, to avoid processing delays, should preferably 
accompany the issue fee. Such submissions should be clearly labeled "Comments on 
Statement of Reasons for Allowance." 

Conclusion

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Scott M. Sciacca whose telephone number is
(571) 270-1919. The examiner can normally be reached on Monday thru Friday, 7:30 A.M. - 5:00
P.M. EST. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Jeff Pwu, can be reached on (571) 272-6798. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Scott M. Sciacca/ 
Examiner, Art Unit 2446 



/Benjamin R Bruckart/ 

Primary Examiner, Art Unit 2446 



